---
title: A/B Testing
keywords: fastai
sidebar: home_sidebar
summary: "Recipes for A/B Testing"
description: "Recipes for A/B Testing"
nb_path: "nbs/03_modeling/ab_testing.ipynb"
---
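import numpy as np
import pandas as pd
import plotly.express as px
from scipy.stats import beta

# Beta(a, b) priors: a and b act as pseudo-counts of successes and failures.
# a / (a + b) sets the expected rate; a larger a + b gives a tighter prior.
pairs = {'Anything possible': (1, 1),
         'Maybe 50%': (10, 10),
         'Likely 50%': (100, 100),
         'Almost certainly 50%': (1000, 1000),
         'Maybe 20%': (2, 8),
         'Likely 20%': (20, 80),
         'Almost certainly 20%': (200, 800)}
# Evaluate each prior's density on a grid over [0, 1] and plot the curves.
x = np.linspace(0, 1, 1000)
df = pd.DataFrame({f'{name}: ({a=}, {b=})': beta(a, b).pdf(x) for name, (a, b) in pairs.items()}, index=x)
fig = px.line(df, x=df.index, y=df.columns)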
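To put numbers on the labels above, here is a minimal sketch that computes the mean and standard deviation of each prior. It assumes `scipy.stats.beta` is the `beta` used in these recipes; the `pairs` dictionary is repeated only so the block runs on its own, and `summary` is a name introduced here for illustration.

```python
from scipy.stats import beta

pairs = {'Anything possible': (1, 1),
         'Maybe 50%': (10, 10),
         'Likely 50%': (100, 100),
         'Almost certainly 50%': (1000, 1000),
         'Maybe 20%': (2, 8),
         'Likely 20%': (20, 80),
         'Almost certainly 20%': (200, 800)}

# For Beta(a, b) the mean is a / (a + b); the spread shrinks as a + b grows.
summary = {name: (beta(a, b).mean(), beta(a, b).std()) for name, (a, b) in pairs.items()}

for name, (mean, std) in summary.items():
    print(f'{name}: mean={mean:.3f}, std={std:.3f}')
```

The three "20%" priors share the same mean but differ in spread, which is what separates "maybe" from "almost certainly".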
Here is an example where we believe the success-to-failure ratio is 4:16, i.e. a prior success rate of about 20%.
success_prior = 4
failure_prior = 16
prior = beta(success_prior, failure_prior)
x = np.linspace(0, 1, 1000)
df = pd.DataFrame({'prior': prior.pdf(x)}, index=x)
fig = px.line(df, x=df.index, y=df.columns)
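# Simulate n Bernoulli trials per variant; each variant's true rate is drawn from U(0.15, 0.25).
n = 200
experiments = {name: np.random.rand(n) < np.random.uniform(0.15, 0.25) for name in ['A', 'B', 'C']}
# Count successes (True) and failures (False) for each variant.
metrics = {name: {'success': results.sum(),
                  'failure': (~results).sum()} for name, results in experiments.items()}
# Conjugate update: add the observed counts to the prior pseudo-counts.
posteriors = {name: beta(success_prior + metrics[name]['success'],
                         failure_prior + metrics[name]['failure']) for name in experiments}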
x = np.linspace(0, 1, 1000)
df = df.assign(**{f'{name}_posterior': posterior.pdf(x) for name, posterior in posteriors.items()})
fig = px.line(df, x=df.index, y=df.columns)
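The posterior construction above relies on Beta-Binomial conjugacy: a Beta(a, b) prior combined with s observed successes and f observed failures gives a Beta(a + s, b + f) posterior. The sketch below illustrates how the posterior mean lands between the prior mean and the observed rate. The counts (30 successes, 170 failures) are hypothetical and are not results from the simulated experiments.

```python
from scipy.stats import beta

success_prior, failure_prior = 4, 16

# Hypothetical observed counts, chosen only for illustration.
successes, failures = 30, 170

# Conjugate update: add the observed counts to the prior pseudo-counts.
posterior = beta(success_prior + successes, failure_prior + failures)

prior_mean = success_prior / (success_prior + failure_prior)
observed_rate = successes / (successes + failures)
posterior_mean = posterior.mean()

print(f'prior mean:     {prior_mean:.4f}')
print(f'observed rate:  {observed_rate:.4f}')
print(f'posterior mean: {posterior_mean:.4f}')
```

With 200 observations against 20 prior pseudo-counts, the data dominates and the posterior mean sits much closer to the observed rate than to the prior mean.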
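# Draw n samples from each posterior and look at the ratio of B's rate to A's rate.
n = 100_000
simulations = {name: posterior.rvs(n) for name, posterior in posteriors.items()}
result = simulations['B'] / simulations['A']
fig = px.histogram(result)
# Ratios to the right of the red line (x = 1) are draws where B beats A.
fig.add_vline(x=1, line_color='red', line_width=3, line_dash='dash');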
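# Share of posterior draws in which name1's rate exceeds name2's rate.
# This estimates P(name1 > name2 | data); it is not a frequentist p-value.
pseudo_pvalue = {f'{name1}_gt_{name2}': (s1 > s2).mean()
                 for name1, s1 in simulations.items()
                 for name2, s2 in simulations.items() if name1 != name2}
pseudo_pvalue['B_gt_A']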
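Beyond the probability that one variant beats another, the same posterior draws can be summarized as a credible interval for the ratio of the two rates. The sketch below is one way to do that, using hypothetical posteriors (prior 4:16 plus made-up counts) and fixed seeds so that it runs on its own; the names `samples_a`, `samples_b`, `low`, and `high` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical posteriors: prior pseudo-counts (4, 16) plus made-up observed counts.
posterior_a = beta(4 + 36, 16 + 164)
posterior_b = beta(4 + 44, 16 + 156)

n = 100_000
samples_a = posterior_a.rvs(n, random_state=0)
samples_b = posterior_b.rvs(n, random_state=1)

# Ratio of B's rate to A's rate for each pair of draws.
ratio = samples_b / samples_a

# Share of draws where B's rate exceeds A's rate.
prob_b_gt_a = (samples_b > samples_a).mean()

# Central 95% credible interval for the ratio.
low, high = np.percentile(ratio, [2.5, 97.5])

print(f'P(B > A) ~ {prob_b_gt_a:.3f}')
print(f'95% credible interval for B/A: [{low:.3f}, {high:.3f}]')
```

If the interval includes 1, the data has not ruled out that the two variants perform the same.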
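# Load the A/B test dataset.
df = pd.read_csv('https://raw.githubusercontent.com/alenyeh1014/DataAnalytics-AB_Testing/master/DataFiles/ab_data.csv')
df.head(3)
# Count users per (group, converted) cell.
df2 = df.pivot_table(index='group', columns='converted', aggfunc='count', values='user_id')
# Total users per group.
df2.sum(axis=1)
# Divide each row by its total to get per-group conversion rates.
df2.div(df2.sum(axis=1), axis=0)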
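As a compact alternative to pivoting and then normalizing, `pd.crosstab` with `normalize='index'` returns the per-group rates in one step. The sketch below uses a tiny hypothetical frame (`toy`) whose column names mirror the ones referenced above; the values are made up and are not taken from `ab_data.csv`.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset above.
toy = pd.DataFrame({
    'user_id': range(8),
    'group': ['control'] * 4 + ['treatment'] * 4,
    'converted': [0, 0, 0, 1, 0, 1, 1, 0],
})

# Rows are groups, columns are converted (0/1); each row is normalized to sum to 1.
rates = pd.crosstab(toy['group'], toy['converted'], normalize='index')
print(rates)
```

The column labelled `1` holds the conversion rate for each group.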
Some products have limited resources, e.g. a fixed supply of Uber drivers: as the number of drivers receiving the treatment increases, the value per driver decreases.
To control for interference, segment users: